AITopics | recurrent class


recurrent class


7749f9c0d5ff109231be21e910a3ced2-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-9-2026, 22:29:07 GMT

artificial intelligence, equation, machine learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)


Restless Bandits with Average Reward: Breaking the Uniform Global Attractor Assumption

Neural Information Processing Systems | Feb-9-2026, 08:47:04 GMT

We study the infinite-horizon restless bandit problem with the average reward criterion, in both discrete-time and continuous-time settings.

artificial intelligence, big data, data mining, (21 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.66)


Stabilizing Fixed-Point Iteration for Markov Chain Poisson Equations

Xu, Yang, Aggarwal, Vaneet

arXiv.org Machine Learning | Feb-3-2026

Poisson equations underpin average-reward reinforcement learning, but beyond ergodicity they can be ill-posed: solutions are non-unique, and standard fixed-point iterations can oscillate on reducible or periodic chains. We study finite-state Markov chains with $n$ states and transition matrix $P$. We show that all non-decaying modes are captured by a real peripheral invariant subspace $\mathcal{K}(P)$, and that the induced operator on the quotient space $\mathbb{R}^n/\mathcal{K}(P)$ is strictly contractive, yielding a unique quotient solution. Building on this viewpoint, we develop an end-to-end pipeline that learns the chain structure, estimates an anchor-based gauge map, and runs projected stochastic approximation to estimate a gauge-fixed representative together with an associated peripheral residual. We prove $\widetilde{O}(T^{-1/2})$ convergence up to projection estimation error, enabling stable Poisson equation learning in multichain and periodic regimes, with applications to performance evaluation of average-reward reinforcement learning beyond ergodicity.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

2602.00474

Country:

North America > United States (0.40)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
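The ill-posedness described in the abstract of "Stabilizing Fixed-Point Iteration for Markov Chain Poisson Equations" can be seen on a two-state periodic chain. The sketch below is an illustration only: the chain, the rewards, and the function names are assumptions, and the damped update is a generic averaging trick, not the paper's projected stochastic approximation or gauge map.

```python
def poisson_step(P, r, g, h):
    """One step of the naive fixed-point map h -> r - g*1 + P h."""
    n = len(h)
    return [r[i] - g + sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]


def damped_step(P, r, g, h, alpha=0.5):
    """Averaged update: mix the old iterate with the naive step (assumed remedy)."""
    t = poisson_step(P, r, g, h)
    return [(1 - alpha) * h[i] + alpha * t[i] for i in range(len(h))]


# Toy periodic chain (assumption): the state flips deterministically each step.
P = [[0.0, 1.0], [1.0, 0.0]]
r = [1.0, 0.0]   # reward 1 in state 0, reward 0 in state 1
g = 0.5          # long-run average reward of this chain

# Naive iteration: oscillates with period 2 and never settles.
h = [0.0, 0.0]
naive = [h]
for _ in range(4):
    h = poisson_step(P, r, g, h)
    naive.append(h)

# Damped iteration: converges to one particular solution.
h = [0.0, 0.0]
for _ in range(50):
    h = damped_step(P, r, g, h)
damped = h

# Non-uniqueness: shifting a solution by a constant gives another solution.
shifted = [x + 1.0 for x in damped]
```

Here `naive` alternates between `[0.0, 0.0]` and `[0.5, -0.5]`, while `damped` and `shifted` are both fixed points of `poisson_step`, which is the non-uniqueness the abstract refers to.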


Restless Bandits with Average Reward: Breaking the Uniform Global Attractor Assumption

Neural Information Processing Systems | Oct-8-2025, 08:19:51 GMT

We study the infinite-horizon restless bandit problem with the average reward criterion, in both discrete-time and continuous-time settings.

disagreement period, ftva, nulla, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.66)


7749f9c0d5ff109231be21e910a3ced2-Supplemental-Conference.pdf

Neural Information Processing Systems | Aug-16-2025, 02:07:12 GMT

artificial intelligence, equation, machine learning, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)


Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach

Ganesh, Swetha, Aggarwal, Vaneet

arXiv.org Machine Learning | May-27-2025

Actor-Critic methods are widely used for their scalability, yet existing theoretical guarantees for infinite-horizon average-reward Markov Decision Processes (MDPs) often rely on restrictive ergodicity assumptions. We propose NAC-B, a Natural Actor-Critic with Batching, that achieves order-optimal regret of $\tilde{O}(\sqrt{T})$ in infinite-horizon average-reward MDPs under the unichain assumption, which permits both transient states and periodicity. This assumption is among the weakest under which the classic policy gradient theorem remains valid for average-reward settings. NAC-B employs function approximation for both the actor and the critic, enabling scalability to problems with large state and action spaces. The use of batching in our algorithm helps mitigate potential periodicity in the MDP and reduces stochasticity in gradient estimates, and our analysis formalizes these benefits through the introduction of the constants $C_{\text{hit}}$ and $C_{\text{tar}}$, which characterize the rate at which empirical averages over Markovian samples converge to the stationary distribution.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

2505.19986

Country:

North America > United States > New York (0.04)
North America > United States > Maryland > Baltimore (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)
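The abstract of "Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach" notes that batching helps with periodicity. A minimal sketch of that intuition, assuming a deterministic two-state periodic chain and a hypothetical helper `batch_occupancy` (this is not NAC-B itself): single samples keep oscillating, while averages over a batch match the stationary distribution.

```python
def batch_occupancy(next_state, start, batch_size, num_batches):
    """Fraction of steps spent in state 0 within each consecutive batch."""
    state = start
    fractions = []
    for _ in range(num_batches):
        hits = 0
        for _ in range(batch_size):
            hits += 1 if state == 0 else 0
            state = next_state[state]
        fractions.append(hits / batch_size)
    return fractions


# Toy periodic chain (assumption): 0 -> 1 -> 0 -> ...
# Its stationary distribution puts mass 0.5 on each state.
next_state = [1, 0]

single_samples = batch_occupancy(next_state, start=0, batch_size=1, num_batches=4)
batched = batch_occupancy(next_state, start=0, batch_size=4, num_batches=3)
```

With `batch_size=1` the estimate of the time spent in state 0 flips between 1.0 and 0.0, whereas every batch of four steps gives 0.5.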


The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes

Santos, Pedro P., Sardinha, Alberto, Melo, Francisco S.

arXiv.org Artificial Intelligence | Sep-23-2024

The general-utility Markov decision process (GUMDP) framework generalizes the MDP framework by considering objective functions that depend on the frequency of visitation of state-action pairs induced by a given policy. In this work, we contribute the first analysis of the impact of the number of trials, i.e., the number of randomly sampled trajectories, in infinite-horizon GUMDPs. We show that, as opposed to standard MDPs, the number of trials plays a key role in infinite-horizon GUMDPs, and the expected performance of a given policy depends, in general, on the number of trials. We consider both discounted and average GUMDPs, where the objective function depends, respectively, on discounted and average frequencies of visitation of state-action pairs. First, we study policy evaluation under discounted GUMDPs, proving lower and upper bounds on the mismatch between the finite and infinite trials formulations for GUMDPs. Second, we address average GUMDPs, studying how different classes of GUMDPs impact the mismatch between the finite and infinite trials formulations. Third, we provide a set of empirical results to support our claims, highlighting how the number of trajectories and the structure of the underlying GUMDP influence policy evaluation.

gumdp, markov chain, recurrent class, (14 more...)

arXiv.org Artificial Intelligence

2409.15128

Country:

South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)
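The dependence on the number of trials claimed in the abstract of "The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes" can be checked exactly on a toy case. The sketch below rests on assumptions that are not taken from the paper: a chain with two absorbing states, each reached with probability 1/2, so that every trajectory's long-run visitation frequency is either (1, 0) or (0, 1), and a hypothetical nonlinear utility equal to the sum of squared frequencies.

```python
from fractions import Fraction
from math import comb


def objective(d):
    """Hypothetical nonlinear utility: sum of squared visitation frequencies."""
    return sum(x * x for x in d)


def expected_objective(num_trials):
    """Exact expected utility of the empirical frequencies over num_trials runs."""
    total = Fraction(0)
    for k in range(num_trials + 1):
        # k of the trajectories end in the first absorbing state.
        prob = Fraction(comb(num_trials, k), 2 ** num_trials)
        d = (Fraction(k, num_trials), Fraction(num_trials - k, num_trials))
        total += prob * objective(d)
    return total


# Infinite-trials value: utility of the expected frequencies (1/2, 1/2).
infinite_trials = objective((Fraction(1, 2), Fraction(1, 2)))

# Finite-trials values for a few trial counts.
finite_trials = {K: expected_objective(K) for K in (1, 2, 10)}
```

In this toy case the finite-trials value works out to 1/2 + 1/(2K), so it approaches the infinite-trials value of 1/2 only as the number of trials K grows. With a linear utility, as in a standard MDP, the two would coincide for every K.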


On Mechanism Underlying Algorithmic Collusion

Xu, Zhang, Zhao, Wei

arXiv.org Artificial Intelligence | Sep-2-2024

Two issues of algorithmic collusion are addressed in this paper. First, we show that in a general class of symmetric games, including Prisoner's Dilemma, Bertrand competition, and any (nonlinear) mixture of first- and second-price auctions, only the (strict) Nash Equilibrium (NE) is stochastically stable. Tacit collusion is therefore driven by a failure to learn the NE due to insufficient learning, rather than by learning strategies that sustain collusive outcomes. Second, we study how algorithms adapt to collusion in real simulations with insufficient learning. Extensive exploration in early stages and the discount factors inflate the Q-value, which interrupts the sequential, alternating price undercutting and leads to a bilateral rebound. The process repeats, making the price curves resemble Edgeworth cycles. When both the exploration rate and the Q-value decrease, the algorithms may, by coincidence, rebound bilaterally to a relatively high common price level and then get stuck. Finally, we reconcile our reasoning with simulation outcomes in the literature, including optimistic initialization, market design, and algorithm design.

collusion, discount factor, q-value, (17 more...)

arXiv.org Artificial Intelligence

2409.01147

Country:

Asia > China (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New York > Suffolk County > Stony Brook (0.04)

Genre: Research Report (0.64)

Industry:

Banking & Finance (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)


Improved and Generalized Upper Bounds on the Complexity of Policy Iteration

Neural Information Processing Systems | Mar-13-2024, 16:30:41 GMT

Given a Markov Decision Process (MDP) with n states and m actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the $\gamma$-discounted optimal policy. We consider two variations of PI: Howard's PI, which changes the actions in all states with a positive advantage, and Simplex-PI, which only changes the action in the state with maximal advantage. We show that Howard's PI terminates …

howard, iteration, simplex-pi, (15 more...)

Neural Information Processing Systems

Country:

Europe > France (0.04)
North America > United States > New York (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)
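The two update rules named in the abstract of "Improved and Generalized Upper Bounds on the Complexity of Policy Iteration" can be compared on a small example. This is a minimal sketch under stated assumptions: the three-state deterministic MDP, the discount factor 0.5, the tie-breaking by lowest index, and all function names are made up for illustration, and policy evaluation is done by simple repeated sweeps rather than an exact solve.

```python
def evaluate(nxt, rew, policy, gamma, sweeps=200):
    """Approximate v_pi by repeating v <- r_pi + gamma * v[next state]."""
    v = [0.0] * len(policy)
    for _ in range(sweeps):
        v = [rew[s][policy[s]] + gamma * v[nxt[s][policy[s]]]
             for s in range(len(policy))]
    return v


def policy_iteration(nxt, rew, gamma, variant, tol=1e-9):
    """Return (policy, number of policy updates) for 'howard' or 'simplex'."""
    n, m = len(nxt), len(nxt[0])
    policy = [0] * n
    updates = 0
    while True:
        v = evaluate(nxt, rew, policy, gamma)
        # Advantage of action a in state s relative to the current policy.
        adv = [[rew[s][a] + gamma * v[nxt[s][a]] - v[s] for a in range(m)]
               for s in range(n)]
        best = [max(range(m), key=lambda a, s=s: adv[s][a]) for s in range(n)]
        gain = [adv[s][best[s]] for s in range(n)]
        if max(gain) <= tol:
            return policy, updates
        if variant == "howard":
            # Switch every state that has a positive advantage.
            policy = [best[s] if gain[s] > tol else policy[s] for s in range(n)]
        else:
            # Simplex-PI: switch only the state with the largest advantage.
            s_star = max(range(n), key=lambda s: gain[s])
            policy[s_star] = best[s_star]
        updates += 1


# Toy MDP (assumption): action 0 stays put with reward 0; action 1 moves one
# state to the right (the last state loops on itself) with reward 1.
nxt = [[0, 1], [1, 2], [2, 2]]
rew = [[0.0, 1.0] for _ in range(3)]

howard_result = policy_iteration(nxt, rew, 0.5, "howard")
simplex_result = policy_iteration(nxt, rew, 0.5, "simplex")
```

On this toy instance both variants reach the same policy, but Howard's rule gets there in a single update because it switches all three states at once, while the simplex rule needs one update per state.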


Restless Bandits with Average Reward: Breaking the Uniform Global Attractor Assumption

Hong, Yige, Xie, Qiaomin, Chen, Yudong, Wang, Weina

arXiv.org Machine Learning | Dec-10-2023

We study the infinite-horizon Restless Bandit problem with the average reward criterion, under both discrete-time and continuous-time settings. A fundamental goal is to design computationally efficient policies that achieve a diminishing optimality gap as the number of arms, $N$, grows large. Existing results on asymptotic optimality all rely on the uniform global attractor property (UGAP), a complex and challenging-to-verify assumption. In this paper, we propose a general, simulation-based framework, Follow-the-Virtual-Advice, that converts any single-armed policy into a policy for the original $N$-armed problem. This is done by simulating the single-armed policy on each arm and carefully steering the real state towards the simulated state. Our framework can be instantiated to produce a policy with an $O(1/\sqrt{N})$ optimality gap. In the discrete-time setting, our result holds under a simpler synchronization assumption, which covers some problem instances that violate UGAP. More notably, in the continuous-time setting, we do not require any additional assumptions beyond the standard unichain condition. In both settings, our work is the first asymptotic optimality result that does not require UGAP.

artificial intelligence, big data, data mining, (21 more...)

arXiv.org Machine Learning

2306.00196

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
